Measuring  the  Performance  of  Automated  Planning  Systems 


Dana  Nau 

Department  of  Computer  Science 
and  Institute  for  Systems  Research 
University  of  Maryland, 
College  Park,  MD  20742,  USA 
email:  nau@cs.umd.edu 


Malik  Ghallab 

LAAS-CNRS 

7,  Avenue  du  Colonel  Roche 
31077,  Toulouse,  cedex,  France 
email:  Malik.Ghallab@laas.fr 


Abstract 

In this paper, we describe existing performance measures for automated
planning algorithms, and discuss the limitations and biases inherent in
those performance measures. We point out the importance of developing a
performance measure that explicitly accounts for the restrictive
assumptions on which a planning algorithm depends, and we propose a
composite performance measure based on three factors:

• the scope of the planning algorithm: which restrictive assumptions
are needed and which can be lifted,

• the control knowledge and tuning required for each planning domain,

• the size of the problems that can be solved in a reasonable amount of
time in each area of its scope (i.e., for each combination of relaxed
assumptions it can handle).

Keywords:  automated  planning,  AI  planning,  performance 
measurement 

1.  Introduction 

Great  strides  have  been  made  in  automated  planning  dur¬ 
ing  the  past  few  years,  and  the  technology  is  becoming  ma¬ 
ture  enough  to  be  useful  in  a  variety  of  demanding  applications, 
ranging  from  controlling  space  vehicles  such  as  Deep  Space  1 
[6]  to  playing  the  game  of  bridge  [31].  Successes  such  as  these 
are  creating  a  great  potential  for  synergy  between  theory  and 
practice:  observing  what  works  well  in  practice  can  lead  to  bet¬ 
ter  theories  of  planning,  and  better  theories  can  lead  to  better 
performance  in  practical  applications. 

Despite  this  potential,  there  currently  is  a  substantial  gap 
between  theoretical  and  application-oriented  work.  The  theo¬ 
retical  work  tends  to  be  rather  narrow  in  scope,  focusing  on 
highly  restricted  cases  such  as  classical  planning,  with  the  most 
common  performance  measure  being  the  speed  of  the  planner’s 
combinatorial  search.  The  application-oriented  work  generally 
depends  on  ad  hoc  application-specific  programming  efforts, 
search  techniques,  and  measures  of  performance. 

Figure 1: A simple conceptual model for planning. $\Sigma$ is a
state-transition system, as described in the text.

For most planning systems, presentations of the planning
algorithm may discuss some of the assumptions and restrictions
explicitly, but usually the algorithm will also depend on additional
assumptions and restrictions that are tacit in the representation
rather than explicit. As a consequence, it is often very
difficult  to  judge  whether  a  planning  algorithm  can  be  useful  for 
real-world  problem  solving,  and  it  is  often  even  more  difficult 
to  tell  whether  an  application-specific  planning  algorithm  can 
be  generalized  to  work  in  anything  other  than  the  specific  ap¬ 
plication  for  which  the  algorithm  has  been  written.  Better  ways 
are  needed  to  judge  the  scope  and  generalizability  of  planning 
algorithms  and  techniques. 

As  a  step  toward  meeting  that  need,  we  describe  a  general 
conceptual  model  for  planning,  and  use  it  to  classify  and  dis¬ 
cuss  the  kinds  of  restrictive  assumptions  that  are  often  made 
in  automated  planning  research.  We  believe  that  with  suitable 
refinement,  such  a  classification  will  provide  a  useful  perfor¬ 
mance  measure  for  automated  planning  algorithms,  by  provid¬ 
ing  a  way  to  give  a  clearer  account  of  what  restrictions  a  plan¬ 
ning  algorithm  requires. 

2.  Conceptual  Model  for  Planning 

Since  planning  is  concerned  with  choosing  and  organizing 
actions  for  changing  the  state  of  a  system,  a  conceptual  model 
for  planning  requires  a  general  model  for  a  dynamic  system. 
This  model,  shown  in  Figure  1,  includes  three  components: 

• A state-transition system $\Sigma$ that evolves as specified by its
state-transition function $\gamma$, according to the events and actions
that it receives. $\Sigma$ includes a set $S$ of states, a set $A$
of actions, a set $E$ of events, and a state-transition function

$\gamma : S \times A \times E \rightarrow 2^S$.

• A controller. Given as input the state $s$ of the system (or more
generally, some observations that give partial knowledge of
the current state), the controller provides as output an action
$a$ according to some plan. Partial knowledge of the state can be
modeled by an observation function

$\eta : S \rightarrow O$

that maps $S$ into some discrete set $O = \{o_1, o_2, \ldots\}$ of
possible observations. The input to the controller is then the
observation $o = \eta(s)$.

• A planner: given as input a description of the system $\Sigma$, an
initial situation and some objective, it synthesizes a plan for
the controller in order to achieve the objective.
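
The three components above can be illustrated with a minimal Python sketch. It is not taken from the paper: the explicit transition table, the door example, and all names (`successors`, `run_controller`, `gamma_table`) are hypothetical, and the plan is assumed to be a simple mapping from observations to actions.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

State = str
Action = str
Event = str
Observation = str

# gamma : S x A x E -> 2^S, stored as an explicit table for a tiny system.
# None stands for "no action" or "no event" in a given transition.
Gamma = Dict[Tuple[State, Optional[Action], Optional[Event]], FrozenSet[State]]


def successors(gamma: Gamma, state: State,
               action: Optional[Action] = None,
               event: Optional[Event] = None) -> FrozenSet[State]:
    """Return gamma(state, action, event); the empty set means 'not applicable'."""
    return gamma.get((state, action, event), frozenset())


def run_controller(gamma: Gamma,
                   eta: Callable[[State], Observation],
                   plan: Dict[Observation, Action],
                   state: State,
                   max_steps: int = 10) -> List[State]:
    """Closed-loop execution: observe, look up the action in the plan, apply it."""
    trace = [state]
    for _ in range(max_steps):
        action = plan.get(eta(state))
        if action is None:          # the plan has nothing to do for this observation
            break
        next_states = successors(gamma, state, action)
        if not next_states:         # action not applicable in this state
            break
        state = sorted(next_states)[0]  # arbitrary but repeatable choice of outcome
        trace.append(state)
    return trace


# Hypothetical two-action system with no events and full observability.
gamma_table: Gamma = {
    ("door_closed", "open_door", None): frozenset({"door_open"}),
    ("door_open", "walk_through", None): frozenset({"inside"}),
}
plan = {"door_closed": "open_door", "door_open": "walk_through"}
trace = run_controller(gamma_table, lambda s: s, plan, "door_closed")
```

Here the observation function is the identity, so the controller sees the full state; replacing the `lambda` with a coarser mapping would model partial observability.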

The planner’s objective can be specified in several different ways.

1. The simplest specification consists of a goal state $s_g$ or
a set of goal states $S_g$: the objective is achieved by any
sequence of state transitions that ends at one of the goal
states.

2. More generally, the objective is to satisfy some condition
over the sequence of states followed by the system; for
example, one might want to specify states to be avoided,
states that the system should reach at some point, and
states that it should stay in.

3.  An  alternative  specification  is  through  a  utility  function 
attached  to  states,  with  penalties  and  rewards,  the  goal 
being  to  optimize  some  compound  function  of  these  util¬ 
ities,  e.g.  sum  or  maximum,  over  the  sequence  of  states 
followed  by  the  system. 

4.  Another  alternative  is  to  specify  the  objective  as  tasks  that 
the  system  should  perform.  These  tasks  can  be  defined 
recursively,  as  sets  of  actions  and  other  tasks. 

3.  Restrictive  Assumptions 

The conceptual model in the last section was deliberately
quite general, in order to provide a starting point for describing
a number of restrictive assumptions:

• Assumption A0 (Finite $\Sigma$). The system $\Sigma$ has a finite
set of states.

• Assumption A1 (Fully Observable $\Sigma$). The system $\Sigma$ is
fully observable, i.e., one has complete knowledge about the
state of $\Sigma$; in this case the observation function $\eta$ is the
identity function.

• Assumption A2 (Deterministic $\Sigma$). The system $\Sigma$ is
deterministic, i.e., for every state $s$ and event or action $u$,
$|\gamma(s, u)| \le 1$. If an action is applicable to a state, its
application brings a deterministic system to a single other state.
Similarly for the occurrence of a possible event.

• Assumption A3 (Static $\Sigma$). The system $\Sigma$ is static, i.e.,
the set of events $E$ is empty. $\Sigma$ has no internal dynamics; it
stays in the same state until the controller applies some action.1

• Assumption A4 (Attainment Goals). The only kind of goal
is an attainment goal, which is specified as an explicit goal
state $s_g$ or a set of goal states $S_g$. The objective is to find
any sequence of state transitions that ends at one of the goal
states. This assumption excludes, for example, states to be
avoided, constraints on state trajectories, and utility functions.

• Assumption A5 (Sequential Plans). A solution plan to a
planning problem is a linearly ordered finite sequence of actions.

• Assumption A6 (Implicit Time). Actions and events have
no duration; they are instantaneous state transitions. This
assumption is embedded in state-transition systems, a model
that does not represent time explicitly.

• Assumption A7 (Off-line Planning). The planner is not
concerned with any change that may occur in $\Sigma$ while it is
planning; it plans for the given initial and goal states
regardless of the current dynamics, if any.

The  simplest  case,  classical  planning,  combines  all  eight 
restrictive  assumptions:  complete  knowledge  about  a  determin¬ 
istic,  static,  finite  system  with  restricted  goals  and  implicit  time. 
Here  planning  reduces  to  the  following  problem: 

Given $\Sigma = (S, A, \gamma)$, an initial state $s_0$ and a
subset of goal states $S_g$, find a sequence of actions
$\langle a_1, a_2, \ldots, a_k \rangle$ corresponding to a sequence
of state transitions $\langle s_0, s_1, \ldots, s_k \rangle$ such that
$s_1 \in \gamma(s_0, a_1)$, $s_2 \in \gamma(s_1, a_2)$, $\ldots$,
$s_k \in \gamma(s_{k-1}, a_k)$, and $s_k \in S_g$.

Since the system is deterministic, if $a$ is applicable to $s$ then
$\gamma(s, a)$ contains one state $s'$. To simplify the notation, we
will say $\gamma(s, a) = s'$ rather than $\gamma(s, a) = \{s'\}$. For this
kind of system, a plan is a sequence $\langle a_1, a_2, \ldots, a_k \rangle$
such that
$\gamma(\gamma(\cdots\gamma(\gamma(s_0, a_1), a_2), \ldots, a_{k-1}), a_k)$
is a goal state.
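
As an illustration of this restricted problem, the following Python sketch checks whether a sequence of actions is a plan and finds one by breadth-first forward search. It assumes the transition function is small enough to be given as an explicit table; the table, the state and action names, and the functions `is_plan` and `forward_search` are hypothetical and only meant to make the definition concrete.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

State = str
Action = str
# Deterministic gamma: (state, action) -> the single successor state.
Transitions = Dict[Tuple[State, Action], State]


def is_plan(actions: List[Action], s0: State, goals: Set[State],
            gamma: Transitions) -> bool:
    """Apply gamma step by step and check that the final state is a goal state."""
    state = s0
    for a in actions:
        if (state, a) not in gamma:   # a is not applicable in this state
            return False
        state = gamma[(state, a)]
    return state in goals


def forward_search(s0: State, goals: Set[State],
                   gamma: Transitions) -> Optional[List[Action]]:
    """Breadth-first search for a shortest plan; returns None if no plan exists."""
    frontier = deque([(s0, [])])
    visited = {s0}
    while frontier:
        state, actions = frontier.popleft()
        if state in goals:
            return actions
        for (s, a), s_next in gamma.items():
            if s == state and s_next not in visited:
                visited.add(s_next)
                frontier.append((s_next, actions + [a]))
    return None


# Hypothetical four-state system.
gamma = {("s0", "a1"): "s1", ("s1", "a2"): "s2", ("s0", "a3"): "s3"}
found = forward_search("s0", {"s2"}, gamma)
```

Scanning the whole table at every expansion is only acceptable for toy examples; it is used here to keep the sketch short.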

The  assumption  about  complete  knowledge  is  needed  only 
at the initial state $s_0$, because the deterministic model allows all
of  the  other  states  to  be  predicted  with  certainty.  The  plan  is 
unconditional,  and  the  controller  executing  the  plan  is  an  open- 
loop  controller,  i.e.,  it  does  not  get  any  feedback  about  the  state 
of  the  system. 

Classical planning may appear trivial: planning is simply
searching for a path in a graph, which is a well understood problem.
Indeed, if we are given the graph $\Sigma$ explicitly then there
is not much more to say about planning for this restricted case.
However, it can be shown [14] that even in very simple problems,
the number of states in $\Sigma$ can be many orders of magnitude
greater than the number of particles in the universe! Thus
it is impossible in any practical sense to list all of $\Sigma$’s states
explicitly. This establishes the need for powerful implicit
representations that can describe useful subsets of $S$ in a way that
both is compact and can easily be searched.

1 The name of this assumption is inaccurate, because the plan is intended
precisely to change the state of the system. What the name means is that the
system remains static unless controlled transitions take place.

The simplest representation for classical planning is a
set-theoretic one: a state $s$ is represented as a collection of
propositions, the set of goal states $S_g$ is represented by specifying a
collection of propositions that all states in $S_g$ must satisfy, and
an action $a$ is represented by giving three lists of propositions:
preconditions to be met in a state $s$ for an action $a$ to be applicable
in $s$, propositions to assert and propositions to retract from
$s$ in order to get the resulting state $\gamma(s, a)$. A plan is any
sequence of actions, and the plan solves the planning problem if,
starting at $s_0$, the sequence of actions is executable, producing
a sequence of states whose final state is in $S_g$.
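
A minimal Python sketch of the set-theoretic representation follows. It is an illustration rather than the authors' formulation: states are frozen sets of proposition names, and the `Action` class, the `apply_action` and `solves` functions, and the block-stacking propositions are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

Proposition = str
State = FrozenSet[Proposition]


@dataclass(frozen=True)
class Action:
    name: str
    preconditions: FrozenSet[Proposition]  # must all hold for the action to apply
    add: FrozenSet[Proposition]            # propositions to assert
    delete: FrozenSet[Proposition]         # propositions to retract


def apply_action(state: State, action: Action) -> Optional[State]:
    """Return gamma(state, action), or None if the action is not applicable."""
    if not action.preconditions <= state:
        return None
    return (state - action.delete) | action.add


def solves(plan: List[Action], s0: State, goal: FrozenSet[Proposition]) -> bool:
    """A plan solves the problem if it is executable from s0 and ends in a goal state."""
    state: Optional[State] = s0
    for action in plan:
        state = apply_action(state, action)
        if state is None:
            return False
    return goal <= state


# Hypothetical one-block example.
s0 = frozenset({"on_table_a", "clear_a", "hand_empty"})
pickup_a = Action(
    name="pickup_a",
    preconditions=frozenset({"on_table_a", "clear_a", "hand_empty"}),
    add=frozenset({"holding_a"}),
    delete=frozenset({"on_table_a", "clear_a", "hand_empty"}),
)
goal = frozenset({"holding_a"})
```

The goal is a set of propositions rather than a set of states: every state that contains all of them counts as a goal state, which is what keeps the representation compact.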

A more expressive representation is the classical
representation:2 starting with a function-free first-order language L, a
state $s$ is a collection of ground atoms, and the set of goal states
$S_g$ is represented by an existentially closed collection of atoms
that all states in $S_g$ must satisfy. An operator is represented by
giving two lists of ground or unground literals: preconditions and
effects. An action is a ground instance of an operator. A plan is
any sequence of actions, and the plan solves the planning problem
if, starting at $s_0$, the sequence of actions is executable,
producing a sequence of states whose final state is in $S_g$.
The de facto standard for classical planning is to use some
variant of this representation.

4.  Classical  Planning  Versus  Planning 
Applications 

For  nearly  the  entire  time  that  automated  planning  has  ex¬ 
isted,  it  has  been  dominated  by  research  on  classical  planning. 
For  a  while,  the  dominance  was  so  complete  that  the  term 
“domain-independent  planning  system”  was  used  to  refer  to 
planning  systems  whose  scope  was  that  of  classical  planning, 
as  if  classical  planning  were  capable  of  representing  all  possi¬ 
ble  planning  domains. 

In  reality,  it  can  be  proved  [14,  Chapters  1-3]  that  classi¬ 
cal  planning  systems  are  restricted  to  a  very  narrow  class  of 
planning  domains.  This  class  excludes  most  problems  of  prac¬ 
tical  interest,  because  most  practical  planning  problems  do  not 
satisfy  the  restrictions  of  classical  planning.  Here  are  a  few  ex¬ 
amples: 

• Process planning for machined parts. Process planning is
an important manufacturing task, and many millions of R&D
dollars have been spent to try to automate it [23]. The state
space consists of the possible states of the workpiece,
including the workpiece geometry and various other parameters.
The action space consists of the possible ways to modify
the workpiece using machining operations. Both spaces
are effectively infinite [17]. The actions have nondeterministic
outcomes due to random variations, but in process planning
the outcomes usually are approximated deterministically
by the use of machining tolerances [9]. The planner
must consult with CAD modelers to reason about the workpiece
geometry, and must query databases to obtain information
about the available machines, tooling, fixturing, and
process parameters. With the exception of a few specialized
process-planning tasks such as sheet-metal bending [16] and
NC toolpath generation [28], generative process planning
tools do not work very well and have not achieved significant
industrial use. By far the most widely used process-planning
tools are those that provide information to help expert humans
do the process planning. Other approaches, e.g., [3, 8],
illustrate the same trend for planning in other manufacturing
applications.

2 This has also been called STRIPS-style representation, after an early
planning system [27] that used a similar representation scheme.

•  Planning  declarer  play  in  bridge.  At  the  beginning  of  play 
in  a  bridge  hand,  the  declarer  (the  player  who  chose  the  trump 
suit)  needs  to  develop  a  plan  for  how  to  play  the  hand.  The 
outcomes  of  the  declarer’s  actions  are  uncertain,  due  both  to 
uncertainty  about  how  the  opponents  will  respond  and  uncer¬ 
tainty  about  how  they  might  be  able  to  respond  (since  the  de¬ 
clarer  does  not  know  which  opponent  holds  which  cards).  A 
game  tree  containing  all  of  the  possibilities  would  have  about 
$2.3 \times 10^{24}$ leaf nodes on the average and about $5.6 \times 10^{44}$
in  the  worst  case  [30,  p.  226].  Since  most  bridge  games  are 
over  in  just  a  few  minutes,  it  would  not  be  feasible  to  explore 
any  significant  fraction  of  such  a  game  tree.  Instead,  tech¬ 
niques  have  been  developed  that  use  various  combinations 
of  game-tree  search,  Monte  Carlo  simulation,  and  reasoning 
about  possible  strategies  [12,  15,  31].  The  resulting  programs 
can play better than the average human bridge player, but not
as well as the best human players.

•  Ship-movement  planning.  Planning  the  movements  of  ships 
is important both commercially and militarily [11]. The state
space  and  action  space  are  effectively  infinite:  states  include 
positions  and  velocities  of  ships,  and  actions  correspond  to 
movements  of  the  ships  along  various  routes.  Since  move¬ 
ments  of  different  ships  may  occur  concurrently,  it  is  im¬ 
portant  to  make  sure  they  do  not  interfere  with  each  other. 
The  outcomes  and  durations  of  the  actions  cannot  be  known 
with  certainty,  because  of  factors  such  as  weather,  currents, 
and  the  behavior  of  the  ships’  operators.  Elaborate  sim¬ 
ulation  tools  are  available  to  aid  in  planning  ship  move¬ 
ments  but  the  planning  is  still  done  manually  [1].  Similarly, 
other  transportation-planning  applications,  such  as  for  rail¬ 
ways  [2],  have  focused  on  interactive  approaches  for  plan¬ 
ning. 

Many other examples could easily be cited; see for example the
PLANET repository’s “Real-World Planning and Scheduling” page
at (http://vitalstatistix.nicve.salford.ac.uk/planet2).


5.  Existing  Performance  Measures 

In  this  section,  we  do  a  quick  survey  of  existing  perfor¬ 
mance  measures,  and  draw  several  conclusions  about  the  lim¬ 
itations  of  those  measures. 

5.1.  Survey 

Performance  measures  for  classical  planners.  The  exis¬ 
tence  of  a  standard  representation  scheme  for  classical  plan¬ 
ning  has  made  it  relatively  easy  to  develop  large  collections  of 
planning  problems  on  which  different  planning  algorithms  can 
be  compared.  In  the  three  international  planning  competitions 
that  have  occurred  so  far  [24,  4,  22],  many  hundreds  of  classi¬ 
cal  planning  problems  have  been  generated,  from  about  fifteen 
different  planning  domains.3  The  most  common  performance 
measures have been success rate, speed, and solution size, i.e.,
the  fraction  of  problems  solved,  the  CPU  time  needed  to  solve 
them,  and  the  size  of  the  solution  found  (the  latter  two  are  nor¬ 
mally  measured  as  a  function  of  the  problem  size).  From  these 
measures,  one  can  get  a  rough  idea  of  the  size  of  the  problems 
that  a  planner  can  solve  in  a  reasonable  amount  of  time. 
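
The three measures can be computed from per-problem results as in the following Python sketch. The record layout (`RunResult`) and the function `summarize` are hypothetical, the sample numbers are made up for illustration, and averaging over solved problems only is one possible convention rather than the one used in the competitions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class RunResult:
    problem_size: int            # e.g., number of objects in the problem
    solved: bool
    cpu_seconds: float
    plan_length: Optional[int]   # None when no plan was found


def summarize(results: List[RunResult]) -> Dict[str, Optional[float]]:
    """Success rate, mean CPU time, and mean solution size over the solved problems."""
    solved = [r for r in results if r.solved]
    return {
        "success_rate": len(solved) / len(results) if results else 0.0,
        "mean_cpu_seconds": mean(r.cpu_seconds for r in solved) if solved else None,
        "mean_plan_length": mean(r.plan_length for r in solved) if solved else None,
        "largest_solved_size": max((r.problem_size for r in solved), default=None),
    }


# Made-up results for three problems of increasing size.
results = [
    RunResult(problem_size=10, solved=True, cpu_seconds=0.5, plan_length=12),
    RunResult(problem_size=20, solved=True, cpu_seconds=1.5, plan_length=30),
    RunResult(problem_size=40, solved=False, cpu_seconds=60.0, plan_length=None),
]
summary = summarize(results)
```

Because unsolved problems are excluded from the averages, the mean CPU time should always be read together with the success rate.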

A partial generalization. The 2002 International Planning
Competition [22] included several collections of planning problems
that did not satisfy all of the restrictions of classical planning.
In these problems, Restrictions A0, A4, and A6 were
weakened, by generalizing the planning language to include
numeric computations and optimization goals.4

Although  these  generalizations  may  seem  rather  modest, 
they  demonstrated  some  interesting  things  about  the  nature  of 
classical  planning,  as  discussed  below. 

For  each  of  the  planners  in  the  competition,  the  planning 
engine  was  problem-independent,  and  the  input  for  each  plan¬ 
ning  problem  included  the  initial  state,  the  goal  or  objective 
to  be  achieved,  and  the  set  of  operators  for  the  problem  do¬ 
main.  However,  the  planners  varied  in  terms  of  how  much  ad¬ 
ditional  knowledge  was  made  available  to  them  about  how  to 
solve  problems  in  the  planning  domain.  The  planners  in  the 
competition  can  be  classified  into  three  categories: 

•  Non-tunable  planners.  In  these  planning  systems,  the  prob¬ 
lem  input  consists  solely  of  the  information  specified  above: 
initial  state,  goal  or  objective,  and  operators.  In  the  compe¬ 
tition,  the  planners  in  this  class  included  most,  but  not  all, 
of  the  ones  that  Long  and  Fox  [22]  have  called  “fully  auto¬ 
mated”  planners. 

3  In  classical  planning,  a  domain  is  basically  a  set  of  planning  operators. 
For  each  domain  it  is  possible  to  produce  an  unlimited  number  of  randomly 
generated  problems  by  specifying  initial  and  goal  states. 

4 In the 2004 International Planning Competition, which was in progress at
the time that we wrote this paper, some of the restrictions have been weakened
further. For details, see
(http://www-rcf.usc.edu/~skoenig/icaps/icaps04/planningcompetition.html).


•  Tunable  planners.  Although  these  planning  systems  have 
usually  been  classified  as  “fully  automated,”  there  are  ways 
to  tune  them  for  better  performance  in  a  given  planning  do¬ 
main.  In  the  2002  competition,  the  planners  in  this  class  in¬ 
cluded  LPG  [13]  and  FF  [18]. 5  For  LPG,  one  of  the  inputs 
was  a  setting  to  optimize  its  performance  for  speed,  quality, 
or  something  in  between,  and  LPG  was  run  with  all  three  set¬ 
tings  during  the  competition.  For  FF,  there  were  two  different 
versions,  both  of  which  were  entered  in  the  competition. 

•  Domain-configurable  planners.  These  are  planning  sys¬ 
tems  whose  input  includes  detailed  information  about  how 
to  solve  problems  in  the  relevant  problem  domain.  Such 
planners  have  sometimes  been  called  “hand-tailored”  plan¬ 
ners  [22],  but  that  term  is  not  accurate  since  the  planning  en¬ 
gine  is  domain-independent.  They  have  also  been  described 
as  “hand-tailorable”  [26]  or  “control-intensive”  [5]  planners. 
In  the  competition,  the  planners  of  this  type  included  SHOP2 
[26],  TLPlan  [5],  and  TALplanner  [19]. 

Performance  measures  for  application-specific  planners. 

For  application-specific  planning  systems,  usually  the  per¬ 
formance  measures  and  the  ways  of  testing  them  are  also 
application-specific.  For  example,  manufacturing-planning  sys¬ 
tems  are  tested  on  collections  of  manufacturing-planning  prob¬ 
lems  that  are  specific  to  the  particular  domain  in  which  the 
planning  is  done  (e.g.,  see  [29]);  and  in  computer  bridge  [31], 
there  are  annual  competitions  in  which  performance  is  mea¬ 
sured  by  playing  the  programs  against  each  other  on  a  set  of 
bridge  hands,  using  the  normal  rules  for  a  bridge  tournament. 
These  kinds  of  measures  are  useful  for  the  application  domain 
at  hand,  but  they  are  not  directly  generalizable  to  other  domains. 

5.2.  Observations 

From  the  survey  in  the  previous  section,  we  can  make  the 
following  observations. 

Observation 1: There is a tradeoff between the amount of
work needed to configure a planner for a domain, and the
planner’s speed and coverage of that domain once it has been so
configured. Here are several examples:

• In the planning competitions, the non-tunable planners were
the ones that had the highest running time and solved the
fewest planning problems, but configuring a non-tunable
planner requires no work whatsoever, provided that the planner
is capable of representing the planning domain.

• In the planning competitions, the tunable planners were
faster than the non-tunable ones. However, some experimentation
may be required to find the settings that give the
best overall performance.

5  Some  of  the  other  planners  in  the  competition  may  also  be  capable  of  being 
tuned,  but  LPG  and  FF  were  the  only  ones  for  which  results  were  submitted 
using  more  than  one  setting  or  version. 


•  In  the  planning  competitions,  the  domain-configurable  plan¬ 
ners  solved  planning  problems  several  orders  of  magnitude 
faster  than  the  others,  and  solved  many  problems  that  were 
too  large  for  the  other  planners  to  solve.  However,  the 
domain-configurable  planners  require  a  significant  amount  of 
up-front  work  to  formulate  the  domain-specific  knowledge 
that  enables  them  to  run  so  quickly,  and  this  work  must  be 
redone  each  time  one  switches  to  a  new  domain. 

•  In  order  to  get  top-level  performance  in  a  specific  application 
domain,  it  may  be  necessary  to  develop  a  domain-specific 
planner.6  However,  developing  and  tuning  such  planners  may 
require  years  of  work.  The  resulting  planning  system  may  be 
quite  good  for  its  particular  application  domain,  but  cannot 
be  used  to  solve  problems  in  any  other  domain. 

Observation  2:  Performance  in  classical  planning  domains 
does  not  predict  performance  in  other  planning-competition  do¬ 
mains.  For  example: 

•  Some  of  the  planning  systems  were  designed,  sometimes 
consciously  and  sometimes  tacitly,  with  classical  planning 
in  mind.  These  planners  did  well  on  classical  domains,  but 
on  non-classical  domains  they  did  not  perform  very  well  (if 
they  could  be  used  at  all). 

•  On  the  other  hand,  some  of  the  planning  systems  were  de¬ 
signed,  from  the  ground  up,  to  work  on  non-classical  plan¬ 
ning  domains.  These  systems  generally  performed  well  on 
both  the  classical  and  non-classical  domains. 

Observation 3: Performance in planning-competition domains
does not predict performance in real-world applications.
For example:

•  Most  of  the  planning  systems  in  the  competition,  including 
both  good  and  bad  performers,  would  not  be  directly  us¬ 
able  in  real-world  applications,  because  of  restrictions  on  the 
kinds  of  planning  problems  that  they  can  solve. 

• A planner that performed poorly in the 2002 planning competition,
IxTeT [20], is used quite successfully for the application
of robot motion planning [21], a domain which most
of the systems in the competition would be unable to address.

•  One  of  the  best  performers  in  the  2002  planning  competition, 
SHOP2  [26],  is  also  proving  useful  in  several  application  ar¬ 
eas.  It  is  developing  a  user  base  that  includes  universities, 
companies  such  as  Sony,  Lockheed  Martin,  and  SIFT,  and 
government  laboratories  such  as  NIST  and  NRL. 

From the above observations, we conclude that it is not adequate
merely to measure running time and percentage of problems
solved. Such figures are not meaningful unless one also
knows the class of planning problems over which such performance
can be achieved, and how much the performance will be
degraded on broader classes of planning problems.

6 Some examples of such systems include Bridge Baron for computer bridge
[31], the Intelligent Bending Workstation for sheet-metal bending [16], and
RAX for autonomous spacecraft control [25].

6.  A  Proposed  Performance  Measure 

In  this  section,  we  discuss  three  different  aspects  of  a  plan¬ 
ning  system’s  performance  that  we  believe  are  important  to 
measure:  the  scope  of  the  problems  that  the  planner  can  solve, 
the  amount  and  kind  of  control  knowledge  that  must  be  given 
to  the  planning  system,  and  the  size  of  the  problems  that  the 
planning  system  can  reasonably  solve. 
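
One way to record these three aspects together is sketched below in Python. This is only an illustrative data structure under our own assumptions, not a definition from the paper: the class `PlannerProfile`, its field names, and the use of a single problem-size number per area of scope are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable

# Labels of the restrictive assumptions A0 through A7.
ASSUMPTIONS = frozenset(f"A{i}" for i in range(8))


@dataclass
class PlannerProfile:
    name: str
    # Kind of control knowledge required per domain (free-text label, hypothetical).
    control_knowledge: str = "none"
    # Area of scope (set of relaxed assumptions) -> largest problem size
    # solved within the chosen time limit.
    max_size_by_scope: Dict[FrozenSet[str], int] = field(default_factory=dict)

    def record(self, relaxed: Iterable[str], size: int) -> None:
        """Record the largest size solved when the given assumptions are relaxed."""
        key = frozenset(relaxed)
        unknown = key - ASSUMPTIONS
        if unknown:
            raise ValueError(f"unknown assumptions: {sorted(unknown)}")
        self.max_size_by_scope[key] = max(size, self.max_size_by_scope.get(key, 0))

    def handles(self, relaxed: Iterable[str]) -> bool:
        """True if the planner has solved problems in this area of its scope."""
        return frozenset(relaxed) in self.max_size_by_scope


profile = PlannerProfile(name="demo_planner", control_knowledge="domain-configured")
profile.record([], 50)            # classical planning: no assumption relaxed
profile.record(["A0", "A6"], 20)  # numeric state variables and explicit time
```

Keying the table by the set of relaxed assumptions makes it explicit that problem size is reported separately for each combination the planner can handle.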

6.1.  Problem  Scope 

We believe that any useful measure of performance for a
planning system needs to include the scope of the problems that
the corresponding planning algorithm is capable of solving. The
set of restrictive assumptions in Section 3 can be used as a basis
for defining what this scope is. More specifically:

• Relaxing Assumption A0 (Finite $\Sigma$). An enumerable, possibly
infinite set of states may be needed, for example, to
describe actions that construct or bring new objects into the
world, or to handle numerical state variables. This brings in
some theoretical issues about decidability and termination.

• Relaxing Assumption A1 (Fully Observable $\Sigma$). If we allow
a static, deterministic system to be partially observable,
then the observations of $\Sigma$ will not fully disambiguate which
state $\Sigma$ is in. For each observation $o$, there may be more than
one state $s$ such that $\eta(s) = o$. Without knowing which state
in $\eta^{-1}(o)$ is the current state, it is no longer possible to
predict with certainty whether an action is applicable and what
state $\Sigma$ will be in after each action.

• Relaxing Assumption A2 (Deterministic $\Sigma$). In a static but
nondeterministic system, each action can lead to different
possible states, so the planner may have to consider alternatives.
Usually nondeterminism requires relaxing Assumption
A5 as well. A plan must encode ways for dealing with alternatives,
e.g., conditional constructs of the form “do a and,
depending on its result, do either b or c”, and iterative
constructs, like “do a until a given result is obtained.” Notice that
the controller has to observe the state $s$: here we are planning
for closed-loop control.

If the complete knowledge assumption (Assumption A1) is
also relaxed, this leads to another difficulty: the controller
does not know exactly the current state $s$ of the system at
run-time. A limiting case is null observability, where no
observations at all can be made at run-time. This leads to a
particular case of planning for open-loop control called
conformant planning.

Some ways of dealing with nondeterminism are extensions of
techniques used in classical planning (such as Graph-based
or SAT-based planning), while others are designed specifically
to deal with nondeterminism, such as planning based
on Markov Decision Processes (MDPs) [7, 14] or
model-checking techniques [10, 14].

• Relaxing Assumption A3 (Static $\Sigma$). We can easily deal
with a dynamic system $\Sigma$ if it is deterministic and fully
observable, and if we further assume that for every state $s$ there
is at most one contingent event $e$ for which $\gamma(s, e)$ is not
empty, and that $e$ will necessarily occur in $s$. Such a system
can be mapped into the restricted model: one redefines
the transition for an action $a$ as $\gamma(\gamma(s, a), e)$, where $e$ is the
event that occurs in the state $\gamma(s, a)$.

In the general model of possible events that may or may not
occur in a state and “compete” with actions, a dynamic system
is nondeterministic from the viewpoint of the planner
even if $|\gamma(s, u)| \le 1$, $u$ being either an action or an event.
Deciding to apply action $a$ in $s$ does not focus the planner’s
prediction to a single state transition. Here again, a conditional
plan will be needed.

• Relaxing Assumption A4 (Attainment Goals). Controlling
a  system  may  require  more  complex  objectives  than  reaching 
a  given  state.  One  would  like  to  be  able  to  specify  to  the  plan¬ 
ner  an  extended  goal  with  requirements  not  only  on  the  final 
state  but  also  on  the  states  traversed,  e.g.,  critical  states  to  be 
avoided,  states  that  the  system  should  go  through,  states  it 
should  stay  in  and  other  constraints  on  its  trajectories.  It  may 
also  be  desirable  to  have  utility  functions  to  be  optimized, 
e.g.,  to  model  a  system  that  must  function  continuously  over 
an  indefinite  period  of  time. 

•  Relaxing  Assumption  A5  (Sequential  Plans).  Here,  a  plan 
may  be  a  mathematical  structure  that  can  be  richer  than  a 
simple  sequence  of  actions.  As  examples,  one  may  consider 
a  plan  to  be  a  partially  ordered  set,  a  sequence  of  sets,  a  con¬ 
ditional  plan  that  forces  alternate  routes  depending  on  the 
outcome  and  current  context  of  execution,  a  “universal  plan” 
or  a  “policy”  that  maps  states  to  appropriate  actions,  or  a 
deterministic  or  nondeterministic  automaton  that  determines 
what  action  to  execute  depending  on  the  previous  history  of 
execution. Relaxing Assumption A5 is often required when
other assumptions are relaxed, as we have seen in the case of
nondeterministic systems (Assumption A3) or when relaxing
Assumptions A1, A3, A4 and A6. Plans as partially ordered
sets, or as sequences of sets of actions, are more easily
handled than conditional plans and policies.

• Relaxing Assumption A6 (Implicit Time). In many planning
domains, action duration and concurrency have to be
taken into account. Time can also be needed for expressing
temporally constrained goals and occurrence of events
with respect to an absolute time reference. However, time is
abstracted away in the state-transition model.⁷ This conceptual
model considers actions or events as instantaneous transitions:
at each clock tick, the controller synchronously reads
the observation for the current state (if needed) and applies
the planned action.

⁷ Other formalisms, such as timed automata, extend state-transition
systems by incorporating an explicit representation of time.

•  Relaxing  Assumption  A7  (Offline  Planning).  The  control 
problem  of  driving  a  system  towards  some  objectives  has  to 
be  handled  online  with  the  dynamics  of  that  system.  While 
a  planner  may  not  have  to  worry  about  all  the  details  of  the 
actual  dynamics,  it  cannot  ignore  completely  how  the  system 
will  evolve.  At  the  least,  it  needs  to  check,  online,  whether 
a  solution  plan  remains  valid,  and,  if  needed,  to  revise  it  or 
replan.  Other  approaches  consider  planning  as  a  process  that 
modifies  the  controller  online. 

For  a  detailed  presentation  of  techniques  for  solving  planning 
problems  with  various  combinations  of  these  restrictions,  see 
[14]. 
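
As a minimal sketch of the mapping described above for a dynamic
but deterministic system (relaxing Assumption A3), the following
Python fragment folds the single forced event into the transition
of each action, so that the redefined transition is γ(γ(s, a), e).
The dictionary encoding of γ, the names fold_events and
forced_event, and the conveyor-belt example are assumptions made
for illustration only; they are not taken from any planning system.

```python
from typing import Callable, Dict, Hashable, Optional, Tuple

State = Hashable
Label = Hashable  # an action or an event


def fold_events(
    gamma: Dict[Tuple[State, Label], State],
    forced_event: Dict[State, Label],
) -> Callable[[State, Label], Optional[State]]:
    """Build gamma'(s, a) = gamma(gamma(s, a), e).

    gamma maps (state, action-or-event) to the unique successor state.
    forced_event maps a state to the one contingent event that
    necessarily occurs in it; states without such an event are absent.
    """

    def gamma_prime(s: State, a: Label) -> Optional[State]:
        s1 = gamma.get((s, a))
        if s1 is None:
            return None  # action a is not applicable in s
        e = forced_event.get(s1)
        if e is None:
            return s1  # no contingent event occurs in gamma(s, a)
        return gamma[(s1, e)]  # the event fires right after the action

    return gamma_prime


# Hypothetical example: stepping onto a moving belt carries us to B.
gamma = {
    ("at_A", "step_on"): "on_belt",
    ("on_belt", "belt_moves"): "at_B",
}
forced_event = {"on_belt": "belt_moves"}
gamma_prime = fold_events(gamma, forced_event)
print(gamma_prime("at_A", "step_on"))
```

With the event folded in, a planner for the restricted model can
treat step_on as a single deterministic transition from at_A to at_B.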

6.2.  Control  Knowledge 

Another  important  aspect  of  a  planning  system’s  perfor¬ 
mance  is  what  kind  of  additional  control  knowledge  (other  than 
just  the  problem  definition)  will  need  to  be  given  to  the  plan¬ 
ning  system  in  order  for  it  to  address  practical  problems.  This 
includes,  for  example,  whether  the  planner  needs  such  knowl¬ 
edge,  how  precise  and  specific  to  a  problem  the  knowledge 
needs  to  be,  whether  the  planner  needs  to  be  fine-tuned  for  dif¬ 
ferent  planning  domains,  and  how  easily  this  knowledge  can  be 
acquired  and  formalized.  It  would  be  quite  difficult  to  express 
this  feature  in  precise  quantified  measurements,  but  a  qualitative 
assessment  of  this  feature  can  be  made,  on  the  basis  of  a  small 
set  of  predefined  classes  ranging  from  planners  that  require  no 
control  knowledge  to  those  that  require  the  domain  author  to  do 
some  highly  demanding  algorithm  development. 
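
One way to picture such a small set of predefined classes is as an
ordinal scale. The sketch below is only an illustration: the two
end points come from the description above, while the two
intermediate classes and all identifier names are hypothetical
placeholders, not a proposed standard.

```python
from enum import IntEnum


class ControlKnowledge(IntEnum):
    """Ordinal scale of the control knowledge a planner requires."""

    NONE = 0                   # only the problem definition is needed
    DOMAIN_TUNING = 1          # hypothetical: per-domain parameter tuning
    CONTROL_RULES = 2          # hypothetical: hand-written domain guidance
    ALGORITHM_DEVELOPMENT = 3  # domain author develops demanding algorithms


def less_demanding(a: ControlKnowledge, b: ControlKnowledge) -> bool:
    """True if class a asks less of the domain author than class b."""
    return a < b
```

Because the scale is ordinal, it supports comparisons between
planners but not arithmetic on the class values.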

6.3.  Problem  Size 

A  third  important  aspect  of  performance  is  what  size  of 
problem  a  planning  system  can  reasonably  solve.  For  this  per¬ 
formance  aspect,  the  traditional  measures  have  been  numeric 
ones,  along  the  lines  of  “this  planner  can  solve  problems  of  size 
n  in  time  t”  for  various  values  of  n  and  t.  This  has  typically 
been  measured  by  running  the  planner  on  a  randomly  generated 
set  of  planning  problems. 

Such  a  performance  measure  has  an  obvious  appeal,  but  as 
we  concluded  in  the  preceding  section,  it  also  has  an  important 
limitation: it is highly biased by the set of benchmark problems
on  which  the  planner  is  tested.  If  a  planning  system  can  solve 
“toy  problems”  in  which  the  solution  plans  contain  hundreds 
or  even  thousands  of  actions,  this  does  not  necessarily  say  any¬ 
thing  about  how  well — or  even  whether — the  system  can  solve 
more  useful  classes  of  planning  problems. 

A  more  useful  way  of  measuring  performance  would  be  to 
use  several  classes  of  problems,  ranging  in  scope  from  toy  prob¬ 
lems  to  very  demanding  applications,  and  measure  performance 
in  each  class. 
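
A per-class measurement of this kind could be tabulated along the
following lines. This is a sketch under stated assumptions: the Run
record, the function name largest_size_solved, the notion of a
scalar problem "size", and the choice to report the largest size up
to which every tested instance was solved within a time limit are
all illustrative choices, and the sample data are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Run(NamedTuple):
    problem_class: str  # e.g. a toy domain or an application domain
    size: int           # hypothetical scalar measure of problem size
    seconds: float      # running time of the planner on this instance
    solved: bool        # whether a solution plan was found


def largest_size_solved(runs: Iterable[Run], time_limit: float) -> Dict[str, int]:
    """For each class, the largest tested size n such that every run of
    size <= n was solved within time_limit (0 if even the smallest fails)."""
    ok = defaultdict(dict)  # class -> size -> all runs of that size succeeded
    for r in runs:
        within = r.solved and r.seconds <= time_limit
        by_size = ok[r.problem_class]
        by_size[r.size] = by_size.get(r.size, True) and within
    report = {}
    for cls, by_size in ok.items():
        best = 0
        for size in sorted(by_size):
            if not by_size[size]:
                break  # stop at the first size with a failure
            best = size
        report[cls] = best
    return report


# Made-up data, for illustration only.
runs = [
    Run("toy", 10, 0.1, True),
    Run("toy", 100, 2.0, True),
    Run("toy", 1000, 50.0, True),
    Run("application", 10, 1.0, True),
    Run("application", 100, 400.0, False),
]
print(largest_size_solved(runs, time_limit=60.0))
```

Reporting one figure per class, rather than a single aggregate,
keeps good results on toy problems from masking poor results on
more demanding classes.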


7. Conclusion

In this paper, we have described existing performance measures
for automated planning algorithms, and have discussed
the limitations and biases inherent in those performance
measures. We have pointed out the importance of developing a
performance measure that explicitly takes into account the
restrictive assumptions on which a planning algorithm depends.
As an initial step toward such a performance measure, we have
defined and discussed a list of restrictive assumptions that are
common to most automated planning systems. We believe that
this list is a starting point for developing a taxonomy of
restrictions that can be used to measure the scope of planning
algorithms.

Based  on  the  above  considerations,  we  have  proposed  a 
composite  performance  measure  based  on  three  factors: 

• the scope of the planning algorithm: which set of restrictive
assumptions are needed and which can be lifted,

•  the  control  knowledge  and  tuning  required  for  each  planning 
domain, 

• the size of the problems that can be solved in a reasonable
amount of time in each area of its scope (i.e., for each
combination of relaxed assumptions it can handle).

Several  aspects  of  this  performance  measure  are  not  yet  (or  not 
yet  fully)  developed,  and  we  hope  that  this  paper  will  encourage 
researchers  to  make  the  effort  needed  to  develop  them. 